FRUIT: Faithfully Reflecting Updated Information in Text
Textual knowledge bases such as Wikipedia require considerable effort to keep
up to date and consistent. While automated writing assistants could potentially
ease this burden, the problem of suggesting edits grounded in external
knowledge has been under-explored. In this paper, we introduce the novel
generation task of *faithfully reflecting updated information in text* (FRUIT)
where the goal is to update an existing article given new evidence. We release
the FRUIT-WIKI dataset, a collection of over 170K distantly supervised instances
produced from pairs of Wikipedia snapshots, along with our data generation
pipeline and a gold evaluation set of 914 instances whose edits are guaranteed
to be supported by the evidence. We provide benchmark results for popular
generation systems as well as for EDIT5 -- a T5-based approach tailored to
editing that we introduce, which establishes the state of the art. Our analysis shows that
developing models that can update articles faithfully requires new capabilities
for neural generation models, and opens doors to many new applications.
Comment: v2.0, NAACL 2022
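To make the task setup concrete, the sketch below shows one way a distantly supervised update instance might be represented and linearized for a T5-style editor. The field names and marker tokens are assumptions for illustration, not the paper's actual data format.

```python
# Hypothetical sketch of a FRUIT-style training instance and its
# linearization for a T5-style editor. Field names and the marker
# tokens [ARTICLE] / [EVIDENCE i] are illustrative assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class UpdateInstance:
    source_text: str      # article text from the earlier snapshot
    evidence: List[str]   # new sentences that support the edit
    target_text: str      # article text from the later snapshot

def to_seq2seq_input(inst: UpdateInstance) -> str:
    """Concatenate the stale article with its new evidence."""
    evidence_str = " ".join(
        f"[EVIDENCE {i}] {e}" for i, e in enumerate(inst.evidence)
    )
    return f"[ARTICLE] {inst.source_text} {evidence_str}"

inst = UpdateInstance(
    source_text="The bridge opened in 1932.",
    evidence=["The bridge was closed for repairs in 2020."],
    target_text="The bridge opened in 1932 and was closed for repairs in 2020.",
)
print(to_seq2seq_input(inst))  # the model is trained to emit inst.target_text
```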
BUMP: A Benchmark of Unfaithful Minimal Pairs for Meta-Evaluation of Faithfulness Metrics
The proliferation of automatic faithfulness metrics for summarization has
produced a need for benchmarks to evaluate them. While existing benchmarks
measure the correlation with human judgements of faithfulness on
model-generated summaries, they are insufficient for diagnosing whether metrics
are: 1) consistent, i.e., indicate lower faithfulness as errors are introduced
into a summary, 2) effective on human-written texts, and 3) sensitive to
different error types (as summaries can contain multiple errors). To address
these needs, we present a benchmark of unfaithful minimal pairs (BUMP), a
dataset of 889 human-written, minimally different summary pairs, where a single
error is introduced to a summary from the CNN/DailyMail dataset to produce an
unfaithful summary. We find BUMP complements existing benchmarks in a number of
ways: 1) the summaries in BUMP are harder to discriminate and less probable
under SOTA summarization models, 2) unlike non-pair-based datasets, BUMP can be
used to measure the consistency of metrics, and reveals that the most
discriminative metrics tend not to be the most consistent, and 3) unlike
datasets containing generated summaries with multiple errors, BUMP enables the
measurement of metrics' performance on individual error types.
Comment: Accepted as a long main conference paper at ACL 2023
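As a concrete illustration of the pair-based consistency measurement BUMP enables, the sketch below checks whether a metric scores each faithful summary above its minimally edited unfaithful counterpart. The token-overlap metric and the example pair are hypothetical stand-ins, not a real baseline or an actual BUMP instance.

```python
# Minimal sketch of a pair-based "consistency" check: a metric is
# consistent on a pair if it scores the faithful summary above its
# minimally edited unfaithful counterpart. The overlap metric below
# is a toy stand-in for a real faithfulness metric.
from typing import Callable, List, Tuple

def overlap_metric(source: str, summary: str) -> float:
    src, summ = set(source.lower().split()), set(summary.lower().split())
    return len(src & summ) / max(len(summ), 1)

def consistency(metric: Callable[[str, str], float],
                pairs: List[Tuple[str, str, str]]) -> float:
    """Fraction of (source, faithful, unfaithful) pairs ranked correctly."""
    correct = sum(
        metric(src, good) > metric(src, bad) for src, good, bad in pairs
    )
    return correct / len(pairs)

pairs = [(
    "The storm hit Florida on Monday, damaging homes.",
    "A storm hit Florida on Monday.",
    "A storm hit Texas on Monday.",  # single entity error introduced
)]
print(consistency(overlap_metric, pairs))  # 1.0 on this toy pair
```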
Propulsion Wheel Motor for an Electric Vehicle
A wheel assembly for an electric vehicle includes a wheel rim that is concentrically disposed about a central axis. A propulsion-braking module is disposed within an interior region of the wheel rim. The propulsion-braking module rotatably supports the wheel rim for rotation about the central axis. The propulsion-braking module includes a liquid cooled electric motor having a rotor rotatable about the central axis, and a stator disposed radially inside the rotor relative to the central axis. A motor-wheel interface hub is fixedly attached to the wheel rim, and is directly attached to the rotor for rotation with the rotor. The motor-wheel interface hub directly transmits torque from the electric motor to the wheel rim at a 1:1 ratio. The propulsion-braking module includes a drum brake system having an electric motor that rotates a cam device, which actuates the brake shoes.
Active Bayesian Assessment for Black-Box Classifiers
Recent advances in machine learning have led to increased deployment of
black-box classifiers across a wide variety of applications. In many such
situations there is a critical need to both reliably assess the performance of
these pre-trained models and to perform this assessment in a label-efficient
manner (given that labels may be scarce and costly to collect). In this paper,
we introduce an active Bayesian approach for assessment of classifier
performance to satisfy the desiderata of both reliability and label-efficiency.
We begin by developing inference strategies to quantify uncertainty for common
assessment metrics such as accuracy, misclassification cost, and calibration
error. We then propose a general framework for active Bayesian assessment using
inferred uncertainty to guide efficient selection of instances for labeling,
enabling better performance assessment with fewer labels. We demonstrate
significant gains from our proposed active Bayesian approach via a series of
systematic empirical experiments assessing the performance of modern neural
classifiers (e.g., ResNet and BERT) on several standard image and text
classification datasets.
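The assessment loop lends itself to a compact illustration. Below is a minimal sketch, assuming a Beta-Bernoulli model of per-class accuracy and a posterior-variance query rule; it conveys uncertainty-guided label allocation in spirit, not the paper's exact inference strategies or selection criteria, and the class count and accuracies are toy values.

```python
# Minimal sketch (not the paper's exact procedure): maintain a Beta
# posterior over each class's accuracy and spend each label on the
# class whose accuracy estimate is currently most uncertain.
import random

random.seed(0)
K = 3                                   # number of predicted classes (toy)
alpha = [1.0] * K                       # Beta(1, 1) priors over accuracy
beta = [1.0] * K
true_acc = [0.95, 0.70, 0.80]           # hidden per-class accuracies (toy)

def posterior_var(a: float, b: float) -> float:
    return a * b / ((a + b) ** 2 * (a + b + 1))

for _ in range(100):                    # label budget
    # Query a label for the class with the most uncertain accuracy.
    k = max(range(K), key=lambda i: posterior_var(alpha[i], beta[i]))
    correct = random.random() < true_acc[k]   # simulate a human label
    alpha[k] += correct
    beta[k] += 1 - correct

for k in range(K):
    mean = alpha[k] / (alpha[k] + beta[k])
    print(f"class {k}: posterior mean accuracy = {mean:.2f}")
```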
Detecting conversation topics in primary care office visits from transcripts of patient-provider interactions.
Objective: Amid electronic health records, laboratory tests, and other technology, office-based patient and provider communication is still the heart of primary medical care. Patients typically present multiple complaints, requiring physicians to decide how to balance competing demands. How this time is allocated has implications for patient satisfaction, payments, and quality of care. We investigate the effectiveness of machine learning methods for automated annotation of medical topics in patient-provider dialog transcripts.
Materials and Methods: We used dialog transcripts from 279 primary care visits to predict talk-turn topic labels. Different machine learning models were trained to operate on single or multiple local talk-turns (logistic classifiers, support vector machines, gated recurrent units) as well as sequential models that integrate information across talk-turn sequences (conditional random fields, hidden Markov models, and hierarchical gated recurrent units).
Results: Evaluation was performed using cross-validation to measure 1) classification accuracy for talk-turns and 2) precision, recall, and F1 scores at the visit level. Experimental results showed that sequential models had higher classification accuracy at the talk-turn level and higher precision at the visit level. Independent models had higher recall scores at the visit level compared with sequential models.
Conclusions: Incorporating sequential information across talk-turns improves the accuracy of topic prediction in patient-provider dialog by smoothing out noisy information from talk-turns. Although the results are promising, more advanced prediction techniques and larger labeled datasets will likely be required to achieve prediction performance appropriate for real-world clinical applications.
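To see how sequential information can smooth noisy talk-turn predictions, consider the toy sketch below: Viterbi decoding over per-turn topic scores with a "sticky" transition prior. The topics, emission scores, and transition probabilities are invented for illustration; this is not one of the study's models.

```python
# Illustrative sketch of why sequence structure helps: Viterbi decoding
# over noisy per-turn topic scores with a "sticky" transition prior
# that smooths isolated label flips. All numbers are invented.
import math

topics = ["biomedical", "psychosocial"]
# Per-talk-turn emission probabilities from an independent classifier (toy).
emissions = [
    {"biomedical": 0.9, "psychosocial": 0.1},
    {"biomedical": 0.6, "psychosocial": 0.4},
    {"biomedical": 0.4, "psychosocial": 0.6},  # noisy outlier turn
    {"biomedical": 0.8, "psychosocial": 0.2},
]
stay, switch = 0.8, 0.2                 # sticky transition prior

def viterbi(emissions):
    scores = {t: math.log(emissions[0][t]) for t in topics}
    back = []
    for em in emissions[1:]:
        new, ptr = {}, {}
        for t in topics:
            best_prev = max(
                topics,
                key=lambda p: scores[p] + math.log(stay if p == t else switch),
            )
            new[t] = (scores[best_prev]
                      + math.log(stay if best_prev == t else switch)
                      + math.log(em[t]))
            ptr[t] = best_prev
        scores, back = new, back + [ptr]
    path = [max(topics, key=scores.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# All four turns decode to "biomedical": the noisy turn is smoothed.
print(viterbi(emissions))
```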